Sanskrit Segmentation
Similar resources
Word Segmentation in Sanskrit Using Path Constrained Random Walks
In Sanskrit, the phonemes at word boundaries undergo changes to form new phonemes through a process called sandhi. A fused sentence can be segmented into multiple possible segmentations. We propose a word segmentation approach that predicts the most semantically valid segmentation for a given sentence. We treat the problem as a query expansion problem and use the path-constrained random ...
Design of a lean interface for Sanskrit corpus annotation
We describe an innovative computer interface designed for assisting annotators in the efficient selection of segmentation solutions for proper tagging of Sanskrit corpus. The proposed solution uses a compact representation of the shared forest of all segmentations. The main idea is to represent the union of all segmentations, abstracting on the sandhi rules used, and aligning on the input sente...
Automatic Sanskrit Segmentizer Using Finite State Transducers
In this paper, we propose a novel method for automatic segmentation of a Sanskrit string into different words. The input for our segmentizer is a Sanskrit string either encoded as a Unicode string or as a Roman transliterated string and the output is a set of possible splits with weights associated with each of them. We followed two different approaches to segment a Sanskrit text using sandhi ...
Sanskrit Compound Processor
Sanskrit is very rich in compound formation. Typically a compound does not code the relation between its components explicitly. To understand the meaning of a compound, it is necessary to identify its components, discover the relations between them and finally generate a paraphrase of the compound. In this paper, we discuss the automatic segmentation and type identification of a compound using ...
Formal Structure of Sanskrit Text: Requirements Analysis for a Mechanical Sanskrit Processor
We discuss the mathematical structure of various levels of representation of Sanskrit text in order to guide the design of computer aids aiming at useful processing of the digitalised Sanskrit corpus. Two main levels are identified, respectively called the linear and functional level. The design space of these two levels is sketched, and the computational implications of the main design choices...
Publication date: 2009